
    Efficient GPU Offloading with OpenMP for a Hyperbolic Finite Volume Solver on Dynamically Adaptive Meshes

    We identify and show how to overcome an OpenMP bottleneck in the administration of GPU memory. It arises for a wave equation solver on dynamically adaptive, block-structured Cartesian meshes which keeps all CPU threads busy and allows all of them to offload sets of patches to the GPU. Our studies show that multithreaded, concurrent, non-deterministic access to the GPU leads to performance breakdowns: the GPU memory bookkeeping offered through OpenMP's map clause, i.e. the allocation and freeing of device buffers, becomes a runtime challenge in its own right, besides the expensive data transfers and the actual computation. We therefore propose to retain the memory management responsibility on the host: a caching mechanism acquires memory on the accelerator on behalf of all CPU threads, keeps hold of this memory, and hands it out to the offloading threads on demand. We show that this user-managed, CPU-based memory administration overcomes the GPU memory bookkeeping bottleneck and speeds up the time-to-solution of the Finite Volume kernels by more than an order of magnitude.
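    A minimal sketch of such a host-side cache, written in C++ against OpenMP's device memory routines (omp_target_alloc, omp_target_free); the class and all names are illustrative, not the solver's actual code:

    #include <omp.h>
    #include <cstddef>
    #include <map>
    #include <mutex>
    #include <vector>

    // Host-managed cache of GPU buffers: device memory is allocated once on a
    // cache miss and recycled afterwards, so concurrent CPU threads no longer
    // trigger an allocate/free on the accelerator for every offloaded patch set.
    class DeviceBufferPool {
    public:
      explicit DeviceBufferPool(int device) : device_(device) {}

      ~DeviceBufferPool() {
        for (auto& entry : free_)
          for (void* p : entry.second) omp_target_free(p, device_);
      }

      // Hand out a cached buffer of `bytes` bytes; allocate only on a miss.
      void* acquire(std::size_t bytes) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto& stack = free_[bytes];
        if (!stack.empty()) {
          void* p = stack.back();
          stack.pop_back();
          return p;
        }
        return omp_target_alloc(bytes, device_);
      }

      // Return a buffer to the cache; the device memory is kept, not freed.
      void release(void* p, std::size_t bytes) {
        std::lock_guard<std::mutex> lock(mutex_);
        free_[bytes].push_back(p);
      }

    private:
      int device_;
      std::mutex mutex_;
      std::map<std::size_t, std::vector<void*>> free_;  // size -> idle buffers
    };

    An offloading thread would then acquire a buffer, stage its patches with omp_target_memcpy, launch the kernel via a target region with an is_device_ptr clause, copy the results back, and release the buffer, instead of paying for map-clause allocation and deallocation on every offload.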

    GPU-Accelerated Asynchronous Error Correction for Mixed Precision Iterative Refinement

    In hardware-aware high performance computing, block-asynchronous iteration and mixed precision iterative refinement are two techniques that may be used to leverage the computing power of SIMD accelerators like GPUs in the iterative solution of linear systems of equations. Although they take very different approaches, they share the basic idea of compensating for the weaker convergence properties of an inferior numerical algorithm by a more efficient use of the available computing power. In this paper, we analyze the potential of combining both techniques. To this end, we derive a mixed precision iterative refinement algorithm that uses a block-asynchronous iteration as error correction solver, and compare its performance with a pure implementation of a block-asynchronous iteration and with an iterative refinement method that uses double precision for the error correction solver. For matrices from the University of Florida Sparse Matrix Collection, we report the convergence behaviour and provide the total solver runtime on different GPU architectures.
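    A minimal sketch of the algorithmic skeleton in C++, under our own simplifying assumptions: a dense row-major matrix, residuals accumulated in double precision, and a plain single precision Jacobi sweep standing in for the GPU block-asynchronous error correction solver (all names are illustrative):

    #include <cmath>
    #include <cstddef>
    #include <vector>

    using Vec = std::vector<double>;
    using Mat = std::vector<double>;  // row-major n x n

    // Residual r = b - A x in double (the "high precision" part of the scheme).
    Vec residual(const Mat& A, const Vec& b, const Vec& x, std::size_t n) {
      Vec r(n);
      for (std::size_t i = 0; i < n; ++i) {
        double s = 0.0;
        for (std::size_t j = 0; j < n; ++j) s += A[i * n + j] * x[j];
        r[i] = b[i] - s;
      }
      return r;
    }

    // Inner error correction solve A c = r, done cheaply in single precision.
    // Here: a fixed number of Jacobi sweeps; the paper uses a GPU
    // block-asynchronous iteration in this role instead.
    Vec correction(const Mat& A, const Vec& r, std::size_t n, int sweeps) {
      std::vector<float> c(n, 0.0f), cNew(n);
      for (int k = 0; k < sweeps; ++k) {
        for (std::size_t i = 0; i < n; ++i) {
          float s = static_cast<float>(r[i]);
          for (std::size_t j = 0; j < n; ++j)
            if (j != i) s -= static_cast<float>(A[i * n + j]) * c[j];
          cNew[i] = s / static_cast<float>(A[i * n + i]);
        }
        c.swap(cNew);
      }
      return Vec(c.begin(), c.end());
    }

    // Outer loop: refine x until the double precision residual is small.
    Vec refine(const Mat& A, const Vec& b, std::size_t n,
               double tol, int maxIter, int sweeps) {
      Vec x(n, 0.0);
      for (int it = 0; it < maxIter; ++it) {
        Vec r = residual(A, b, x, n);
        double nrm = 0.0;
        for (double v : r) nrm += v * v;
        if (std::sqrt(nrm) < tol) break;
        Vec c = correction(A, r, n, sweeps);
        for (std::size_t i = 0; i < n; ++i) x[i] += c[i];  // update in double
      }
      return x;
    }

    Note that Jacobi-type corrections converge only for suitable matrices, e.g. strictly diagonally dominant ones; the point of the sketch is the division of labour: cheap low precision correction solves inside a double precision refinement loop.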

    A partial condition number for linear least squares problems

    No full text

    Statistical estimates for the conditioning of linear least squares problems

    In this paper we are interested in computing linear least squares (LLS) condition numbers to measure the numerical sensitivity of an LLS solution to perturbations in the data. We propose a statistical estimate for the normwise condition number of an LLS solution, where perturbations of the data are measured using the Frobenius norm for matrices and the Euclidean norm for vectors. We also explain how condition numbers for the individual components of an LLS solution can be computed. We present numerical experiments that compare the statistical condition estimates with their corresponding exact values.
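    For orientation, the quantity being estimated can be written as follows (standard background under one common normalization, stated from the related literature rather than quoted from the paper): with the minimum-norm solution x = A^{\dagger} b and the residual r = b - Ax, the normwise condition number under Frobenius-norm perturbations of (A, b) admits the closed form

    \kappa_{\mathrm{LS}} = \|A^{\dagger}\|_2 \left( \|A^{\dagger}\|_2^2 \, \|r\|_2^2 + \|x\|_2^2 + 1 \right)^{1/2}

    The statistical estimate avoids computing \|A^{\dagger}\|_2 exactly: it probes the sensitivity of the solution along a small number of random directions and rescales the averaged samples, in the spirit of small-sample condition estimation.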

    Using Random Butterfly Transformations to Avoid Pivoting in Sparse Direct Methods

    We consider the solution of sparse linear systems by direct methods via LU factorization. Unless the matrix is positive definite, numerical pivoting is usually needed to ensure stability, which is costly to implement, especially in the sparse case. The Random Butterfly Transformations (RBT) technique provides an alternative to pivoting and is easily parallelizable. The RBT transforms the original matrix into another one that can be factorized without pivoting with probability one. This approach has been successful for dense matrices; in this work, we investigate the sparse case. In particular, we address the issue of fill-in in the transformed system.
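    As background on the transformation itself (our summary of the standard construction, not text from the paper): a butterfly matrix of size n is

    B^{\langle n \rangle} = \frac{1}{\sqrt{2}} \begin{pmatrix} R_0 & R_1 \\ R_0 & -R_1 \end{pmatrix}

    with random nonsingular diagonal matrices R_0 and R_1 of size n/2, and a recursive butterfly of depth d is a product of block-diagonal matrices built from such butterflies. Given two independent recursive butterflies U and V, one factorizes U^T A V by LU without pivoting and solves

    (U^T A V) \, y = U^T b, \qquad x = V y

    Since butterflies consist of diagonal blocks, applying them is cheap relative to the factorization; for dense matrices, small depths (d \le 2) have been reported to suffice in practice.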